Approximate Sampling Formulas for General Finite-alleles Models of Mutation.
Authors
Abstract
Many applications in genetic analyses utilize sampling distributions, which describe the probability of observing a sample of DNA sequences randomly drawn from a population. In the one-locus case with special models of mutation, such as the infinite-alleles model or the finite-alleles parent-independent mutation model, closed-form sampling distributions under the coalescent have been known for many decades. However, no exact formula is currently known for more general models of mutation that are of biological interest. In this paper, models with finitely many alleles are considered, and an urn construction related to the coalescent is used to derive approximate closed-form sampling formulas for an arbitrary irreducible recurrent mutation model or for a reversible recurrent mutation model, depending on whether the number of distinct observed allele types is at most three or four, respectively. It is demonstrated empirically that the formulas derived here are highly accurate when the per-base mutation rate is low, which holds for many biological organisms.
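As an illustration of the kind of closed-form result the abstract refers to for the parent-independent mutation (PIM) model, the sketch below evaluates the classical stationary sampling probability of an unordered sample with allele counts n_1, ..., n_K, which takes the Dirichlet-multinomial form. It is a minimal sketch, not the paper's approximate formulas: the function names `rising` and `pim_sampling_probability` are hypothetical, and the parameterization theta_i = theta * pi_i (population-scaled mutation rate times the probability of mutating to allele i) is an assumption about notation.

```python
from math import factorial


def rising(x, n):
    """Rising factorial x (x+1) ... (x+n-1); equals 1 when n == 0."""
    result = 1.0
    for k in range(n):
        result *= x + k
    return result


def pim_sampling_probability(counts, thetas):
    """Probability of an unordered sample with the given allele counts
    under the PIM model at stationarity (assumed notation: thetas[i] is
    the scaled rate of mutating to allele i)."""
    n = sum(counts)
    theta = sum(thetas)
    # Multinomial coefficient n! / (n_1! ... n_K!), kept as an exact integer.
    multinomial = factorial(n)
    for c in counts:
        multinomial //= factorial(c)
    # Product of rising factorials, one per allele type.
    numerator = 1.0
    for c, t in zip(counts, thetas):
        numerator *= rising(t, c)
    return multinomial * numerator / rising(theta, n)


if __name__ == "__main__":
    # Hypothetical low mutation rates for a three-allele example.
    print(pim_sampling_probability((4, 1, 0), [0.01, 0.02, 0.03]))
```

Summing this probability over every count vector with a fixed sample size should give 1, which is a quick sanity check on any implementation.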
Similar resources
Approximate Sampling Formulae for General Finite-alleles Models of Mutation
Many applications in genetic analyses utilize sampling distributions, which describe the probability of observing a sample of DNA sequences randomly drawn from a population. In the one-locus case with special models of mutation, such as the infinite-alleles model or the finite-alleles parent-independent mutation model, closed-form sampling distributions under the coalescent have been known for ...
Closed-form two-locus sampling distributions: accuracy and universality.
Sampling distributions play an important role in population genetics analyses, but closed-form sampling formulas are generally intractable to obtain. In the presence of recombination, there is no known closed-form sampling formula that holds for an arbitrary recombination rate. However, we recently showed that it is possible to obtain useful closed-form sampling formulas when the population-sca...
A principled approach to deriving approximate conditional sampling distributions in population genetics models with recombination.
The multilocus conditional sampling distribution (CSD) describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. The CSD has a wide range of applications in both computational biology and population genomics analysis, including phasing genotype data into haplotype data, imputing missing data, estimat...
Maintenance of genetic variability under the pressure of neutral and deleterious mutations in a finite population.
In order to assess the effect of deleterious mutations on various measures of genic variation, approximate formulas have been developed for the frequency spectrum, the mean number of alleles in a sample, and the mean homozygosity; in some particular cases, exact formulas have been obtained. The assumptions made are that two classes of mutations exist, neutral and deleterious, and that selection...
Performance Analysis of Device to Device Communications Overlaying/Underlaying Cellular Network
Minimizing the outage probability and maximizing throughput are two important aspects in device to device (D2D) communications, which are greatly related to each other. In this paper, first, the exact formulas of the outage probability for D2D communications underlaying or overlaying cellular network are derived which jointly experience Additive White Gaussian Noise (AWGN) and Rayleigh multipat...
Journal: Advances in Applied Probability
Volume 44, Issue 2
Pages: -
Publication date: 2012